library(tidyverse)
library(asreml)Online License checked out Mon Feb 2 11:50:40 2026
Previously: More Complex Mixed Models
Our goal today is to fill out our capabilities with asreml / mixed-models.
Several additional layers of modelling options in asreml:
Often, the exact row-column design (spatial layout) of field trials varies from year-to-year.
We already talked about some capacity of mixed-models to account for heterogeneous error variance among trials.
However, real fields can present gradients or other patterns that are not foreseeable when designing experiments. For example, variation in soil water holding capacity or pH, nutrients, etc. Those factors cause spatial correlation patterns between the errors that may not be accounted for by the experimental blocking effects.
If unaccounted for, these kinds of phenomena will bias our conclusions about the genetic units in the experiment.
So, what are our options?
Option 1: By default we model errors as independent and identically distributed: . Basically, ignore variation within field.
For this model, the residual structure, with regards to the field dimensions looks like this:
Option 2: We can include main-effects in the model for range and row (field dimensions), as random post-hoc blocking factors.
Option 3 and our focus for today: In this approach, a correlation matrix is applied to the residuals such that errors are correlated with the degree of correlation proportional to spatial proximity. Autocorrelation can be applied to one or both spatial dimensions. The R matrix should look something like this:
is a correlation matrix for the columns
is a correlation matrix for the rows
They are square and symmetric matrices (i.e. row-by-row and col-by-col).
Row matrix has dimension . Column matrix has .
and are auto-correlation parameters that get estimated when we fit the model with REML. The exponents on cause correlation to drop-off to zero with distance between plots.
In Isik, Holland, and Maltecca (2017) (Chapter 7), the example of a 3 row by 4 column design is given:
An “AR1 x AR1” model means a model with autoregressive correlation structure for both the ROW and COLUMN effects.
Model selection procedures include LRTs and AIC/BIC should be used to decide whether and which spatial approach to use.
library(tidyverse)
library(asreml)Online License checked out Mon Feb 2 11:50:40 2026
alt<-read.csv(here::here("data","19-25CC_ALT_DATA_ALL_2025-08-23.csv"))
# My example data chunk
# This time expanding to include 2 sites, 2 years
# Still small, but suits my examples
# Keep using your own!
alny_alt<-alt %>%
filter(site %in% c("AL","NY"),
year %in% c(24,25))Let’s look for some spatial patterns to model
alny_alt %>%
group_by(site,site.year) %>%
summarize(range=max(range),
row=max(row))`summarise()` has grouped output by 'site'. You can override using the
`.groups` argument.
# A tibble: 4 × 4
# Groups: site [2]
site site.year range row
<chr> <chr> <int> <int>
1 AL 24AL 8 7
2 AL 25AL 5 12
3 NY 24NY 6 10
4 NY 25NY 6 12
4 site.years
4 different layouts
alny_alt %>%
ggplot(.,aes(x=site.year,fill=rep,y=biomass.1)) + geom_boxplot()Warning: The following aesthetics were dropped during statistical transformation: fill.
ℹ This can happen when ggplot fails to infer the correct grouping structure in
the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
variable into a factor?
Let’s make some plots. Do you see any trends or patterns across the spatial dimensions?
alny_alt %>% mutate(range=as.character(range)) %>%
ggplot(.,aes(x=range,fill=range,y=biomass.1)) + geom_boxplot() + facet_grid(site+year~.,scales='free')alny_alt %>% mutate(row=factor(row,levels=1:13)) %>%
ggplot(.,aes(x=row,fill=row,y=biomass.1)) + geom_boxplot() + facet_grid(site+year~.,scales='free')My attempt to make a heatmap of the spatial variation in biomass from one site.year:
alny_alt %>%
filter(site.year=="24NY") %>%
select(row,range,biomass.1) %>%
spread(range,biomass.1) %>% as.matrix %>%
image(.)ny24_alt<-alny_alt %>%
filter(site.year=="24NY")
ny24_alt<-ny24_alt %>%
# Make factors before modeling
mutate(entry=as.factor(entry),
rep=as.factor(paste0(site.year,"-",rep)),
row=as.factor(row),
range=as.factor(range)) %>%
arrange(range,row)For our current purpose, let’s analyze just one site-year.
iid_error <- asreml(biomass.1~rep,
random=~entry,
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -115.3087 15.93264 58 11:50:41
2 -110.9788 10.87602 58 11:50:41
3 -107.8596 7.527075 58 11:50:41
4 -106.9027 6.132323 58 11:50:41
5 -106.7701 5.607674 58 11:50:41
6 -106.7693 5.566663 58 11:50:41
summary(iid_error)$varcomp component std.error z.ratio bound %ch
entry 13.002783 4.257152 3.054339 P 0
units!R 5.566663 1.435833 3.876958 P 0
row_random <- asreml(biomass.1~rep,
random=~entry+row,
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -116.2221 15.36010 58 11:50:41
2 -111.1389 10.21223 58 11:50:41
3 -107.5665 6.758174 58 11:50:41
4 -106.4198 5.267610 58 11:50:41
5 -106.2339 4.669468 58 11:50:41
6 -106.2314 4.595406 58 11:50:41
# summary(row_random)$varcomp
lucid::vc(row_random) effect component std.error z.ratio bound %ch
row 0.8572 1.047 0.82 P 0.5
entry 13.63 4.34 3.1 P 0.1
units!R 4.595 1.376 3.3 P 0
lrt(iid_error, row_random)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
row_random/iid_error 1 1.0759 0.1498
rng_random <- asreml(biomass.1~rep,
random=~entry+range,
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -115.1688 15.15870 58 11:50:41
2 -110.6454 10.07005 58 11:50:41
3 -107.4475 6.824721 58 11:50:41
4 -106.4917 5.568092 58 11:50:41
5 -106.3596 5.138133 58 11:50:41
6 -106.3581 5.121793 58 11:50:41
Warning in asreml(biomass.1 ~ rep, random = ~entry + range, data = ny24_alt):
Some components changed by more than 1% on the last iteration
# summary(rng_random)$varcomp
lucid::vc(rng_random) effect component std.error z.ratio bound %ch
range 1.024 1.485 0.69 P 1.2
entry 12.42 4.071 3.1 P 0
units!R 5.122 1.391 3.7 P 0
lrt(iid_error, rng_random)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
rng_random/iid_error 1 0.82235 0.1822
# this should be identical to the "iid" residuals model, just allow me to plot residual variogram
iid_error1<-asreml(biomass.1~rep,
random=~entry,
residual=~id(range):id(row),
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -115.3087 15.93264 58 11:50:41
2 -110.9788 10.87602 58 11:50:41
3 -107.8596 7.527075 58 11:50:41
4 -106.9027 6.132323 58 11:50:41
5 -106.7701 5.607674 58 11:50:41
6 -106.7693 5.566663 58 11:50:41
plot(varioGram(iid_error1))ar1_rng <- asreml(biomass.1~rep,
random=~entry,
residual=~ar1(range):id(row),
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -116.0934 16.55381 58 11:50:41
2 -110.8165 10.75185 58 11:50:41
3 -106.7631 7.113229 58 11:50:41
4 -105.5400 5.790797 58 11:50:41
5 -105.4022 5.333902 58 11:50:41
6 -105.4018 5.316984 58 11:50:41
plot(varioGram(ar1_rng))lrt(iid_error1, ar1_rng)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
ar1_rng/iid_error1 1 2.7351 0.04908 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rng)$aic[1] 216.8035
attr(,"parameters")
[1] 3
lucid::vc(ar1_rng) effect component std.error z.ratio bound %ch
entry 14.08 4.343 3.2 P 0
range:row!R 5.317 1.414 3.8 P 0
range:row!range!cor 0.3668 0.1953 1.9 U 0
Genetic variance went up!
ar1_row <- asreml(biomass.1~rep,
random=~entry,
residual=~id(range):ar1(row),
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -115.1729 16.07438 58 11:50:41
2 -110.1702 10.71814 58 11:50:41
3 -106.7327 7.515932 58 11:50:41
4 -105.8954 6.338057 58 11:50:41
5 -105.8080 5.905261 58 11:50:41
6 -105.8078 5.876783 58 11:50:41
plot(varioGram(ar1_row))lrt(iid_error1, ar1_row)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
ar1_row/iid_error1 1 1.9231 0.08276 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_row)$aic[1] 217.6155
attr(,"parameters")
[1] 3
lucid::vc(ar1_row) effect component std.error z.ratio bound %ch
entry 12.64 4.049 3.1 P 0
range:row!R 5.877 1.629 3.6 P 0
range:row!row!cor 0.3326 0.2124 1.6 U 0.1
ar1_rngrow <- asreml(biomass.1~rep,
random=~entry,
residual=~ar1(range):ar1(row),
data=ny24_alt)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -115.9548 16.69992 58 11:50:41
2 -109.7624 10.53414 58 11:50:41
3 -105.2070 7.207152 58 11:50:41
4 -104.3140 6.167376 58 11:50:41
5 -104.2257 5.754424 58 11:50:41
6 -104.2252 5.727410 58 11:50:41
plot(varioGram(ar1_rngrow))lrt(iid_error1, ar1_rngrow)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
ar1_rngrow/iid_error1 2 5.0881 0.03168 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(iid_error1)$aic[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rngrow)$aic[1] 216.4504
attr(,"parameters")
[1] 4
lucid::vc(ar1_rngrow) effect component std.error z.ratio bound %ch
entry 13.67 4.159 3.3 P 0.1
range:row!R 5.727 1.733 3.3 P 0
range:row!range!cor 0.3867 0.1894 2 U 0
range:row!row!cor 0.3624 0.2175 1.7 U 0.1
summary(iid_error1)$aic[1] 217.5386
attr(,"parameters")
[1] 2
summary(ar1_rngrow)$aic[1] 216.4504
attr(,"parameters")
[1] 4
summary(ar1_rng)$aic[1] 216.8035
attr(,"parameters")
[1] 3
The units term is sometimes called the “nugget effect” or the “measurement error”.
Add a plot-level (one level for each experimental unit) random-effect to the model in addition to the spatial residual term.
Separates residual effects into two parts. portion independent among plots and another part for correlated errors associated with spatial distance.
The correlated-effects might be b/c soil fertility, water holding capacity, etc.
The measurement error (nugget) due might be due to ‘permanent environment’ or field position effects; something constant, like having a cow sit on a particular plot.
ar1_rngrow_nugget <- asreml(biomass.1~rep,
random=~entry+idv(units),
residual=~ar1(range):ar1(row),
data=ny24_alt, maxit=90)ASReml Version 4.2 02/02/2026 11:50:41
LogLik Sigma2 DF wall
1 -116.0673 15.30780 58 11:50:41 ( 1 restrained)
2 -111.4385 11.90274 58 11:50:41 ( 1 restrained)
3 -106.3191 7.968799 58 11:50:41 ( 1 restrained)
4 -104.4891 6.477581 58 11:50:41 ( 1 restrained)
5 -104.2402 5.901006 58 11:50:41 ( 1 restrained)
6 -104.2244 5.699143 58 11:50:41 ( 1 restrained)
7 -104.2171 5.208599 58 11:50:41
8 -104.2012 4.864135 58 11:50:41
9 -104.1918 4.682068 58 11:50:41
10 -104.1867 4.608293 58 11:50:41
11 -104.1842 4.592215 58 11:50:41
12 -104.1829 4.596334 58 11:50:41
13 -104.1822 4.606527 58 11:50:41
14 -104.1817 4.617851 58 11:50:41
15 -104.1815 4.628659 58 11:50:42
lrt(iid_error1, ar1_rngrow_nugget)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
ar1_rngrow_nugget/iid_error1 3 5.1756 0.05671 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrt(ar1_rngrow, ar1_rngrow_nugget)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
ar1_rngrow_nugget/ar1_rngrow 1 0.087489 0.3837
summary(ar1_rngrow)$aic[1] 216.4504
attr(,"parameters")
[1] 4
summary(ar1_rngrow_nugget)$aic[1] 218.363
attr(,"parameters")
[1] 5
summary(ar1_rngrow_nugget)$varcomp component std.error z.ratio bound %ch
entry 13.5962748 4.1745145 3.2569716 P 0.2
units!units 1.6728314 2.2348234 0.7485296 P 0.8
range:row!R 4.6286587 2.7798010 1.6651043 P 0.0
range:row!range!cor 0.5803943 0.3654835 1.5880180 U 0.5
range:row!row!cor 0.5908623 0.4029625 1.4662961 U 0.9
Adding the “nugget” made a poorer fit model.
Has better application with long-term perennial plots, for example.
Spatially correlated residual model introduces some complications for estimating heritability.
AR models may provide a better fit to data but can inflate the residual variance.
For “traditional” variance component-based heritability estimates, asreml provides a handy function called vpredict().
# Handy asreml function: vpredict
## Calculates functions of the variance components and their std. errors
#iid_error$vparameters
vpredict(iid_error,H2~V1/(V1+V2)) Estimate SE
H2 0.7002246 0.09417944
# Handy asreml function: vpredict
## Calculates functions of the variance components and their std. errors
vpredict(ar1_rngrow,H2~V1/(V1+V2+V3)) Estimate SE
H2 0.6909303 0.09436592
The alternative is , the mean reliability (see previous lesson).
# extract BLUPs and make them into a data.frame
blups<-summary(ar1_rngrow,coef=T)$coef.random %>%
as.data.frame %>%
rownames_to_column(var = "entry") %>%
filter(grepl("^entry_",entry)) %>%
mutate(entry=gsub("entry_","",entry,fixed = T))
# extract Vg (variance component associated with "entry" effect)
sigma2g<-summary(ar1_rngrow)$varcomp["entry","component"]
# Compute PEV and REL for each BLUP
blups<-blups %>%
mutate(PEV=std.error^2,
REL=1-(PEV/sigma2g))
# Mean REL is our estimate of heritability:
mean(blups$REL)[1] 0.8437074
You can use the dsum() function like we did previously, along with auto-regressive structures. This allows to model each trial’s spatial error separately.
# Make factors before modeling
alny_alt<-alny_alt %>%
mutate(entry=as.factor(entry),
site.year=as.factor(site.year),
site=as.factor(site),
# unless you are expecting / modeling a linear trend in year
# convert it to factor
year=as.factor(year),
# explicitly nest values of rep in site.year
# review explicit vs. implicit nesting
rep=as.factor(paste0(site.year,"-",rep)),
row=as.factor(row),
range=as.factor(range)) %>%
# ORDER IS IMPORTANT HERE
arrange(site.year,range,row)
ar1_alny <- asreml(biomass.1~1+site.year,
random=~entry+rep,
residual=~dsum(~ar1(range):ar1(row)|site.year),
data=alny_alt, maxiter=90)ASReml Version 4.2 02/02/2026 11:50:42
LogLik Sigma2 DF wall
1 -635.6699 1.0 244 11:50:42 ( 2 restrained)
2 -613.1544 1.0 244 11:50:42
3 -600.7582 1.0 244 11:50:42
4 -594.2571 1.0 244 11:50:42
5 -591.6365 1.0 244 11:50:42 ( 1 restrained)
6 -590.8221 1.0 244 11:50:42 ( 1 restrained)
7 -590.6447 1.0 244 11:50:42 ( 1 restrained)
8 -590.6117 1.0 244 11:50:42 ( 1 restrained)
9 -590.6052 1.0 244 11:50:42
10 -590.6039 1.0 244 11:50:42
11 -590.6036 1.0 244 11:50:42
Warning in asreml(biomass.1 ~ 1 + site.year, random = ~entry + rep, residual =
~dsum(~ar1(range):ar1(row) | : Some components changed by more than 1% on the
last iteration
plot(varioGram(ar1_alny))summary(ar1_alny)$varcomp component std.error z.ratio bound %ch
rep 5.530377e-06 NA NA B NA
entry 2.279569e+00 1.06556154 2.1393123060 P 0.2
site.year_24AL!R 2.424695e+02 57.11406757 4.2453547994 P 0.0
site.year_24AL!range!cor 4.045302e-01 0.13000414 3.1116719832 U 0.0
site.year_24AL!row!cor 3.286445e-01 0.13785021 2.3840692566 U 0.0
site.year_24NY!R 1.418477e+01 3.02084593 4.6956268241 P 0.0
site.year_24NY!range!cor -1.492073e-01 0.15674867 -0.9518887772 U 0.1
site.year_24NY!row!cor 5.768768e-05 0.15138292 0.0003810712 U 366.8
site.year_25AL!R 1.967632e+02 37.65342648 5.2256384660 P 0.0
site.year_25AL!range!cor 1.447034e-03 0.15382791 0.0094068388 U 4.6
site.year_25AL!row!cor 1.661224e-01 0.13217975 1.2567918090 U 0.0
site.year_25NY!R 1.966660e+01 6.66496577 2.9507428747 P 0.2
site.year_25NY!range!cor 6.560056e-01 0.11568750 5.6704967625 U 0.1
site.year_25NY!row!cor 6.715315e-01 0.09137516 7.3491692384 U 0.1
If your design isn’t rectangular or there is significant missingness in the data, the autoregressive model can pose problems.
An alternative is to use splines as described by Rodríguez-Álvarez et al. (2018). This can be implemented using the sommer mixed-model R package or the SpATs packages. Splines are less susceptible to missing data and also don’t mind non-rectangular designs.
Recommend reading Isik, Holland, and Maltecca (2017) - Chapter 6.
Multiple responses variables (traits) can be jointly analyzed with a mixed-model. Such models enable the estimation of genetic correlations (genetic variance-covariance) between traits.
What is genetic correlation?
The tendency of different phenotypes to be co-inherited.
Trait variances can be decomposed into genetic and environmental components. So can covariances.
Where do genetic correlations come from?
Why do we care? What are genetic correlations “good for”?
Genetic correlations influence the direction of selection response and can either facilitate or else constrain evolutionary change in a population.
This is one version of what we call the “Breeder’s Equation” (Hill and Mackay 2004; Lande and Arnold 1983; “Animal Breeding Plans.Jay L. Lush” 1946).
In short, select response tends in the direction of genetic correlations, even if the direction of selection is opposed to the correlation. Example: if seed size and seed number or negatively genetically correlated, selection for larger seed size and number will proceed very slowly.
Practically, genetic correlations can be used to predict the effectiveness of indirect selections on one trait.
Statistical / Analytic Benefits:
Multivariate mixed-models for low heritability traits (like grain yield or biomass) typically have low selection accuracy. When there are correlated traits, potentially that are more precisely measurable and/or have higher heritability, joint multivariate modeling can increase our accuracy. By exploiting their covariances with higher heritability traits (such as size and morphology), yield or lower heritability traits can be better evaluated.
Multivariate models can accommodate the implicit structure of the selection process.
For example, in open-pollinated clover breeding, culling selection is used to remove poor performing plants before they flower. Early on, we score traits on ALL plants. Later, after we apply selection, when we record trait data we are missing data on the culled plants. In other words, the data are not missing at random but are missing more frequently for later-recorded traits.
We’ll circle back to benefits and use cases as we talk more about plant breeding and quantitative genetics.
For now, let’s look at two ways to estimate genetic variance-covariance between traits in asreml.
Consider the following equation for a mixed linear model:
The default variance of random effects is and of residuals is .
However, there are other variance-covariance functions available in asreml that can be tested. Each of these breaks the assumption of independence between components. For example:
| Model | Number Parameter | Description | asreml-R |
|---|---|---|---|
| Identity | 1 | Identical variation | id |
| Diagonal | M | Heterogeneous variations | diag |
| CS | 2 | Compound symmetry with homogeneous variance | interation model |
| CS Het | M+1 | Compound symmetry with heterogeneous variance | corh |
| AR1 | M+1 | First order autoregressive model | ar1 |
| Unstructured | M(M+1)/2 | Unstructured model | us |
Graphically:
The multivatiate mixed-model can be written by stacking univariate mixed-models on top of each other and adding covariance parameters to-be-estimated to the matrices G and R.
= vector of observations for the ith trait;
= vector of fixed effects for the ith trait;
= vector of random genetic (plant-genotype-level) effects for the ith trait;
= vector of random residual effects for the ith trait;
and = incidence matrices relating records of the ith trait to fixed and random animal effects.
Below, you can see what the V matrix looks like in this simple case of two traits. Notice the terms and plus and . These represent genetic and residual covariances that will be estimated when the model is fit. Univariate models simply assume dthese covariances = 0.
Let’s look at raw phenotypic correlations:
alny_alt %>%
select(biomass.1,fall.vigor.av,spring.vigor.av) %>%
as.matrix %>%
cor(., use='pairwise.complete.obs') %>% round(.,2) biomass.1 fall.vigor.av spring.vigor.av
biomass.1 1.00 -0.09 0.28
fall.vigor.av -0.09 1.00 0.42
spring.vigor.av 0.28 0.42 1.00
Positive phenotypic correlation between biomass.1 and spring.vigor.av.
There are two ways to accomplish a multi-trait model in asreml: “wide” and “long” form.
Wide form poses the two phenotypes as different responses and your Y vector becomes a matrix with a column for each trait. Long form stacks the traits such that Y remains a column vector.
Let’s try modeling those two traits.
NOTE: You’ll see a special term “traits” appear. Like “units”, it is a special term using by asreml. It creates an internal factor corresponding to the columns in the response matrix (the traits).
# Make factors before modeling
alny_df<-alny_alt %>%
mutate(entry=as.factor(entry),
site.year=as.factor(site.year),
site=as.factor(site),
# unless you are expecting / modeling a linear trend in year
# convert it to factor
year=as.factor(year),
# explicitly nest values of rep in site.year
# review explicit vs. implicit nesting
rep=as.factor(paste0(site.year,"-",rep)),
row=as.factor(row),
range=as.factor(range)) %>%
# ORDER IS IMPORTANT HERE
arrange(site.year,range,row)
# I started by fiddling with a simpler model
### notice the lack of residual structure
# alny_mod <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year,
# random=~us(trait):entry + at(trait):rep,
# residual=~id(units):diag(trait),
# data=alny_df, maxiter=90,
# na.action = na.method(y = "include", x = "include"))# After verifying this worked, I added back the AR1:AR1 design like so:
alny_mod <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year,
random=~us(trait):entry + at(trait):rep,
residual=~dsum(~ar1(range):ar1(row):diag(trait)|site.year),
data=alny_df, maxiter=90,
na.action = na.method(y = "include", x = "include"))ASReml Version 4.2 02/02/2026 11:50:42
LogLik Sigma2 DF wall
1 -2199.979 1.0 488 11:50:42 ( 3 restrained)
2 -1794.125 1.0 488 11:50:42 ( 1 restrained)
3 -1390.562 1.0 488 11:50:42
4 -1106.224 1.0 488 11:50:42 ( 1 restrained)
5 -970.1781 1.0 488 11:50:42 ( 1 restrained)
6 -910.9503 1.0 488 11:50:42
7 -884.2078 1.0 488 11:50:42
8 -878.1670 1.0 488 11:50:42
9 -877.1991 1.0 488 11:50:42 ( 3 restrained)
10 -877.1026 1.0 488 11:50:42 ( 3 restrained)
11 -877.0878 1.0 488 11:50:42 ( 3 restrained)
12 -877.0831 1.0 488 11:50:42 ( 3 restrained)
13 -877.0804 1.0 488 11:50:42 ( 3 restrained)
14 -877.0784 1.0 488 11:50:42 ( 3 restrained)
15 -877.0766 1.0 488 11:50:42 ( 3 restrained)
16 -877.0751 1.0 488 11:50:42 ( 3 restrained)
17 -877.0737 1.0 488 11:50:42 ( 3 restrained)
18 -877.0725 1.0 488 11:50:42 ( 3 restrained)
19 -877.0714 1.0 488 11:50:42 ( 3 restrained)
20 -877.0705 1.0 488 11:50:42 ( 3 restrained)
summary(alny_mod)$varcomp component std.error
at(trait, 'biomass.1'):rep 1.91242174 2.46175862
at(trait, 'spring.vigor.av'):rep 0.41595948 0.38948385
trait:entry!trait_biomass.1:biomass.1 2.08683064 1.26746356
trait:entry!trait_spring.vigor.av:biomass.1 1.62716420 0.56827658
trait:entry!trait_spring.vigor.av:spring.vigor.av 1.29652343 0.40546225
site.year_24AL!R 1.00000000 NA
site.year_24AL!range!cor 0.18972669 0.10983805
site.year_24AL!row!cor 0.39109275 0.09083933
site.year_24AL!trait_biomass.1 224.05796078 46.38157188
site.year_24AL!trait_spring.vigor.av 4.19921596 0.97961339
site.year_24NY!R 1.00000000 NA
site.year_24NY!range!cor -0.18099805 0.11382558
site.year_24NY!row!cor 0.05600982 0.10667289
site.year_24NY!trait_biomass.1 16.08285080 3.40914404
site.year_24NY!trait_spring.vigor.av 3.37645175 0.69649884
site.year_25AL!R 1.00000000 NA
site.year_25AL!range!cor -0.02959075 0.11795822
site.year_25AL!row!cor 0.33582993 0.09857263
site.year_25AL!trait_biomass.1 197.05919532 40.93066682
site.year_25AL!trait_spring.vigor.av 1.64576776 0.39858289
site.year_25NY!R 1.00000000 NA
site.year_25NY!range!cor 0.33616824 0.09459790
site.year_25NY!row!cor 0.30649519 0.08845506
site.year_25NY!trait_biomass.1 11.06523430 2.22217011
site.year_25NY!trait_spring.vigor.av 4.45881240 0.95744183
z.ratio bound %ch
at(trait, 'biomass.1'):rep 0.7768518 P 0.0
at(trait, 'spring.vigor.av'):rep 1.0679762 P 0.0
trait:entry!trait_biomass.1:biomass.1 1.6464620 ? 0.3
trait:entry!trait_spring.vigor.av:biomass.1 2.8633315 ? 0.1
trait:entry!trait_spring.vigor.av:spring.vigor.av 3.1976427 ? 0.1
site.year_24AL!R NA F 0.0
site.year_24AL!range!cor 1.7273312 U 0.0
site.year_24AL!row!cor 4.3053239 U 0.0
site.year_24AL!trait_biomass.1 4.8307539 P 0.0
site.year_24AL!trait_spring.vigor.av 4.2866053 P 0.0
site.year_24NY!R NA F 0.0
site.year_24NY!range!cor -1.5901351 U 0.0
site.year_24NY!row!cor 0.5250615 U 0.2
site.year_24NY!trait_biomass.1 4.7175627 P 0.0
site.year_24NY!trait_spring.vigor.av 4.8477493 P 0.0
site.year_25AL!R NA F 0.0
site.year_25AL!range!cor -0.2508579 U 0.2
site.year_25AL!row!cor 3.4069288 U 0.0
site.year_25AL!trait_biomass.1 4.8144634 P 0.0
site.year_25AL!trait_spring.vigor.av 4.1290477 P 0.1
site.year_25NY!R NA F 0.0
site.year_25NY!range!cor 3.5536545 U 0.0
site.year_25NY!row!cor 3.4649821 U 0.1
site.year_25NY!trait_biomass.1 4.9794722 P 0.0
site.year_25NY!trait_spring.vigor.av 4.6570060 P 0.0
What you should notice is that the variance and covariance estimates are all listed in a data.frame.
Unfortunately, I’m unaware of a convenience function that makes it into a simple var-covar matrix.
Here’s my hack:
# Extract only the estimates
g11<-summary(alny_mod)$varcomp %>% .[grepl("biomass.1:biomass.1",rownames(.)),"component"]
g12<-g21<-summary(alny_mod)$varcomp %>% .[grepl("spring.vigor.av:biomass.1",rownames(.)),"component"]
g22<-summary(alny_mod)$varcomp %>% .[grepl("spring.vigor.av:spring.vigor.av",rownames(.)),"component"]
# Manually construct a matrix
G<-matrix(c(g11,g12,g21,g22),
2, 2, byrow=TRUE,
dimnames=list(c("biomass.1","spring.vigor.av"),
c("biomass.1","spring.vigor.av")))
G biomass.1 spring.vigor.av
biomass.1 2.086831 1.627164
spring.vigor.av 1.627164 1.296523
Convert to correlations
cov2cor(G) biomass.1 spring.vigor.av
biomass.1 1.0000000 0.9892307
spring.vigor.av 0.9892307 1.0000000
The BLUPs can be extracted same as before.
They do require a bit of clean-up to their names.
The code below has a bunch of steps but all I’m doing is refactoring the output to be a nice looking data.frame.
blups<-summary(alny_mod,coef=T)$coef.random %>%
# confert to data.frame
as.data.frame %>%
# change the rownames to a column the the data.frame
rownames_to_column(var = "entry") %>%
# filter to only the BLUPs for the entry main-effect
# (output includes all random terms)
filter(grepl("trait_",entry),
grepl(":entry_",entry)) %>%
# Clean up the entry names by remove prefixes
mutate(entry=gsub("trait_","",entry),
entry=gsub("entry_","",entry)) %>%
# Separate the entry name, it contains both trait and entry names now
separate(entry,c("Trait","entry"),sep = ":") %>%
select(Trait,entry,solution) %>%
# Convert from long to wide (put the trait BLUPs next to each other as columns)
spread(Trait,solution)
blups %>% head entry biomass.1 spring.vigor.av
1 17MDCCH 0.2290380 0.1682749
2 17MDCCS 0.6700303 0.5380645
3 17RMCCL -1.3385127 -1.0287830
4 18NCCCEGiant 0.5832913 0.4780943
5 19MDCC 1.3092683 1.0629740
6 19NCCCLH 0.1288877 0.1047911
blups %>%
ggplot(.,aes(x=spring.vigor.av,y=biomass.1)) + geom_point() + labs(title='BLUPs from Multivar Model')alny_alt %>%
ggplot(.,aes(x=spring.vigor.av,y=biomass.1)) + geom_point()Last thing I want to show here is model comparison.
You can’t directly compare a uni-variate to a multivariate model. They are different datasets, after all.
Instead, first fit a multi-trait model with the us(traits) variance structure. This estimates the genetic variance-covariances.
Next, fit the same model, but change to diag(traits). This is equivalent to model the two traits separately.
A LRT and/or AIC comparison can then be made.
alny_mod_diagtraits <- asreml(cbind(biomass.1,spring.vigor.av)~trait+trait:site.year,
random=~diag(trait):entry + at(trait):rep,
residual=~dsum(~ar1(range):ar1(row):diag(trait)|site.year),
data=alny_df, maxiter=90,
na.action = na.method(y = "include", x = "include"))ASReml Version 4.2 02/02/2026 11:50:42
LogLik Sigma2 DF wall
1 -2189.708 1.0 488 11:50:42 ( 3 restrained)
2 -1784.732 1.0 488 11:50:42
3 -1387.722 1.0 488 11:50:42
4 -1107.703 1.0 488 11:50:42 ( 1 restrained)
5 -973.2809 1.0 488 11:50:42
6 -905.2485 1.0 488 11:50:42
7 -886.2921 1.0 488 11:50:42
8 -882.6019 1.0 488 11:50:42
9 -882.1435 1.0 488 11:50:42
10 -882.0897 1.0 488 11:50:42
11 -882.0815 1.0 488 11:50:42
12 -882.0800 1.0 488 11:50:42
summary(alny_mod_diagtraits)$varcomp component std.error z.ratio
at(trait, 'biomass.1'):rep 1.711014e+00 2.3573054 0.725834613
at(trait, 'spring.vigor.av'):rep 4.182708e-01 0.3905696 1.070925268
trait:entry!trait_biomass.1 1.484542e+00 1.0618270 1.398101335
trait:entry!trait_spring.vigor.av 1.275245e+00 0.4038857 3.157440333
site.year_24AL!R 1.000000e+00 NA NA
site.year_24AL!range!cor 1.934024e-01 0.1119265 1.727941004
site.year_24AL!row!cor 4.160064e-01 0.0902496 4.609509889
site.year_24AL!trait_biomass.1 2.358932e+02 49.9376628 4.723752971
site.year_24AL!trait_spring.vigor.av 3.955565e+00 0.9331135 4.239103551
site.year_24NY!R 1.000000e+00 NA NA
site.year_24NY!range!cor -2.024806e-01 0.1137123 -1.780639558
site.year_24NY!row!cor 7.290569e-04 0.1070799 0.006808531
site.year_24NY!trait_biomass.1 1.539425e+01 3.3366494 4.613686149
site.year_24NY!trait_spring.vigor.av 3.630845e+00 0.7505270 4.837727606
site.year_25AL!R 1.000000e+00 NA NA
site.year_25AL!range!cor -2.124033e-02 0.1157803 -0.183453782
site.year_25AL!row!cor 3.054894e-01 0.1015944 3.006951174
site.year_25AL!trait_biomass.1 2.170321e+02 44.8714581 4.836752469
site.year_25AL!trait_spring.vigor.av 1.561906e+00 0.3842398 4.064924788
site.year_25NY!R 1.000000e+00 NA NA
site.year_25NY!range!cor 2.822685e-01 0.1037768 2.719957614
site.year_25NY!row!cor 4.109012e-01 0.0852862 4.817909476
site.year_25NY!trait_biomass.1 1.068695e+01 2.2224356 4.808666390
site.year_25NY!trait_spring.vigor.av 4.841211e+00 1.0831756 4.469461502
bound %ch
at(trait, 'biomass.1'):rep P 0.9
at(trait, 'spring.vigor.av'):rep P 0.1
trait:entry!trait_biomass.1 P 0.3
trait:entry!trait_spring.vigor.av P 0.1
site.year_24AL!R F 0.0
site.year_24AL!range!cor U 0.0
site.year_24AL!row!cor U 0.0
site.year_24AL!trait_biomass.1 P 0.0
site.year_24AL!trait_spring.vigor.av P 0.0
site.year_24NY!R F 0.0
site.year_24NY!range!cor U 0.0
site.year_24NY!row!cor U 21.6
site.year_24NY!trait_biomass.1 P 0.1
site.year_24NY!trait_spring.vigor.av P 0.0
site.year_25AL!R F 0.0
site.year_25AL!range!cor U 0.5
site.year_25AL!row!cor U 0.1
site.year_25AL!trait_biomass.1 P 0.0
site.year_25AL!trait_spring.vigor.av P 0.1
site.year_25NY!R F 0.0
site.year_25NY!range!cor U 0.5
site.year_25NY!row!cor U 0.3
site.year_25NY!trait_biomass.1 P 0.1
site.year_25NY!trait_spring.vigor.av P 0.2
lrt(alny_mod_diagtraits,alny_mod)Likelihood ratio test(s) assuming nested random models.
(See Self & Liang, 1987)
df LR-statistic Pr(Chisq)
alny_mod/alny_mod_diagtraits 1 10.019 0.0007746 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(alny_mod)$aic[1] 1796.141
attr(,"parameters")
[1] 21
summary(alny_mod_diagtraits)$aic[1] 1804.16
attr(,"parameters")
[1] 20
We have a clear result. Including the genetic covariance between traits leads to a significantly better model fit.
You can declare those covariance components different from zero.
alny_long<-alny_alt %>%
pivot_longer(cols = c(biomass.1, spring.vigor.av),
names_to = "trait",
values_to = "value",values_drop_na = T) %>%
mutate(trait = factor(trait),
entry=as.factor(entry),
site.year=as.factor(site.year),
site=as.factor(site),
# unless you are expecting / modeling a linear trend in year
# convert it to factor
year=as.factor(year),
# explicitly nest values of rep in site.year
# review explicit vs. implicit nesting
rep=as.factor(paste0(site.year,"-",rep)),
row=as.factor(row),
range=as.factor(range)) %>%
arrange(site.year, range, row, trait)
alny_long %>% select(site.year,entry,rep,row,range,trait,value) %>% head# A tibble: 6 × 7
site.year entry rep row range trait value
<fct> <fct> <fct> <fct> <fct> <fct> <dbl>
1 24AL Aldo 24AL-24AL-1 1 1 biomass.1 56.3
2 24AL Aldo 24AL-24AL-1 1 1 spring.vigor.av 9
3 24AL 21NCCCLH 24AL-24AL-1 2 1 biomass.1 46.4
4 24AL 21NCCCLH 24AL-24AL-1 2 1 spring.vigor.av 5
5 24AL 23MDCC 24AL-24AL-1 3 1 biomass.1 55.8
6 24AL 23MDCC 24AL-24AL-1 3 1 spring.vigor.av 5
Key points:
value = NA.alny_long_mod <- asreml(
fixed = value ~ trait + trait:site.year,
random = ~ us(trait):entry + us(trait):rep,
# residual = ~ dsum(~ar1(range):ar1(row):diag(trait) | site.year),
data = alny_long,
maxiter = 90,
na.action = na.method(y="include", x="include"))ASReml Version 4.2 02/02/2026 11:50:42
LogLik Sigma2 DF wall
1 -1243.737 49.90869 488 11:50:42 ( 5 restrained)
2 -1239.542 48.94557 488 11:50:42 ( 4 restrained)
3 -1231.443 49.06078 488 11:50:42 ( 6 restrained)
4 -1228.730 47.39486 488 11:50:42 ( 4 restrained)
5 -1228.725 47.38316 488 11:50:42 ( 4 restrained)
6 -1227.820 47.01267 488 11:50:42 ( 4 restrained)
7 -1227.818 47.01039 488 11:50:42 ( 6 restrained)
8 -1227.317 46.86020 488 11:50:42 ( 4 restrained)
9 -1227.316 46.85783 488 11:50:42 ( 6 restrained)
10 -1227.006 46.77536 488 11:50:42 ( 4 restrained)
11 -1227.006 46.77254 488 11:50:42 ( 6 restrained)
12 -1226.800 46.72221 488 11:50:42 ( 4 restrained)
13 -1226.800 46.71891 488 11:50:42 ( 6 restrained)
14 -1226.656 46.68684 488 11:50:42 ( 4 restrained)
15 -1226.656 46.68300 488 11:50:42 ( 6 restrained)
16 -1226.551 46.66226 488 11:50:42 ( 6 restrained)
17 -1226.551 46.65852 488 11:50:42 ( 6 restrained)
18 -1226.473 46.64214 488 11:50:42 ( 5 restrained)
19 -1226.428 46.79242 488 11:50:42 ( 6 restrained)
20 -1226.365 46.63227 488 11:50:42 ( 5 restrained)
21 -1226.317 46.78490 488 11:50:42 ( 6 restrained)
22 -1226.272 46.61667 488 11:50:42 ( 5 restrained)
23 -1226.220 46.77570 488 11:50:42 ( 6 restrained)
24 -1226.188 46.60401 488 11:50:42 ( 6 restrained)
25 -1226.178 46.58351 488 11:50:42 ( 6 restrained)
26 -1226.169 46.58424 488 11:50:42 ( 6 restrained)
27 -1226.161 46.57983 488 11:50:42 ( 6 restrained)
28 -1226.154 46.57973 488 11:50:42 ( 6 restrained)
29 -1226.147 46.57706 488 11:50:42 ( 6 restrained)
30 -1226.141 46.57632 488 11:50:42 ( 6 restrained)
31 -1226.135 46.57450 488 11:50:42 ( 6 restrained)
32 -1226.130 46.57358 488 11:50:42 ( 6 restrained)
33 -1226.125 46.57222 488 11:50:42 ( 6 restrained)
34 -1226.120 46.57129 488 11:50:42 ( 6 restrained)
35 -1226.116 46.57021 488 11:50:42 ( 6 restrained)
36 -1226.111 46.56934 488 11:50:42 ( 6 restrained)
37 -1226.107 46.56844 488 11:50:42 ( 6 restrained)
38 -1226.104 46.56765 488 11:50:42 ( 6 restrained)
39 -1226.100 46.56688 488 11:50:42 ( 6 restrained)
40 -1226.097 46.56618 488 11:50:42 ( 6 restrained)
41 -1226.094 46.56550 488 11:50:42 ( 6 restrained)
42 -1226.091 46.56488 488 11:50:42 ( 6 restrained)
43 -1226.088 46.56428 488 11:50:42 ( 6 restrained)
44 -1226.086 46.56372 488 11:50:42 ( 6 restrained)
45 -1226.083 46.56319 488 11:50:42 ( 6 restrained)
46 -1226.081 46.56269 488 11:50:42 ( 6 restrained)
47 -1226.079 46.56221 488 11:50:42 ( 6 restrained)
48 -1226.077 46.56176 488 11:50:42 ( 6 restrained)
49 -1226.075 46.56133 488 11:50:42 ( 6 restrained)
50 -1226.073 46.56092 488 11:50:42 ( 6 restrained)
51 -1226.071 46.56053 488 11:50:42 ( 6 restrained)
52 -1226.069 46.56016 488 11:50:42 ( 6 restrained)
53 -1226.067 46.55980 488 11:50:42 ( 6 restrained)
54 -1226.066 46.55947 488 11:50:42 ( 6 restrained)
55 -1226.064 46.55914 488 11:50:42 ( 6 restrained)
56 -1226.063 46.55883 488 11:50:43 ( 6 restrained)
57 -1226.061 46.55854 488 11:50:43 ( 6 restrained)
58 -1226.060 46.55826 488 11:50:43 ( 6 restrained)
59 -1226.058 46.55798 488 11:50:43 ( 6 restrained)
60 -1226.057 46.55772 488 11:50:43 ( 6 restrained)
61 -1226.056 46.55747 488 11:50:43 ( 6 restrained)
62 -1226.055 46.55723 488 11:50:43 ( 6 restrained)
63 -1226.054 46.55700 488 11:50:43 ( 6 restrained)
64 -1226.052 46.55678 488 11:50:43 ( 6 restrained)
65 -1226.051 46.55656 488 11:50:43 ( 6 restrained)
66 -1226.050 46.55635 488 11:50:43 ( 6 restrained)
67 -1226.049 46.55616 488 11:50:43 ( 6 restrained)
68 -1226.048 46.55596 488 11:50:43 ( 6 restrained)
69 -1226.047 46.55578 488 11:50:43 ( 6 restrained)
70 -1226.046 46.55560 488 11:50:43 ( 6 restrained)
71 -1226.046 46.55542 488 11:50:43 ( 6 restrained)
72 -1226.045 46.55526 488 11:50:43 ( 6 restrained)
73 -1226.044 46.55509 488 11:50:43 ( 6 restrained)
74 -1226.043 46.55494 488 11:50:43 ( 6 restrained)
75 -1226.042 46.55479 488 11:50:43 ( 6 restrained)
76 -1226.042 46.55464 488 11:50:43 ( 6 restrained)
77 -1226.041 46.55450 488 11:50:43 ( 6 restrained)
78 -1226.040 46.55436 488 11:50:43 ( 6 restrained)
79 -1226.039 46.55422 488 11:50:43 ( 6 restrained)
80 -1226.039 46.55409 488 11:50:43 ( 6 restrained)
81 -1226.038 46.55397 488 11:50:43 ( 6 restrained)
82 -1226.037 46.55384 488 11:50:43 ( 6 restrained)
83 -1226.037 46.55372 488 11:50:43 ( 6 restrained)
84 -1226.036 46.55361 488 11:50:43 ( 6 restrained)
85 -1226.036 46.55350 488 11:50:43 ( 6 restrained)
86 -1226.035 46.55339 488 11:50:43 ( 6 restrained)
87 -1226.034 46.55328 488 11:50:43 ( 6 restrained)
88 -1226.034 46.55318 488 11:50:43 ( 6 restrained)
89 -1226.033 46.55307 488 11:50:43 ( 6 restrained)
90 -1226.033 46.55298 488 11:50:43 ( 6 restrained)
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 1 times in iteration 1 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 3 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 1 times in iteration 5 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 7 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Warning : US updates modified 2 times in iteration 9 to
remain positive definite.
Warning in asreml(fixed = value ~ trait + trait:site.year, random =
~us(trait):entry + : Log-likelihood not converged
Modeling this way isn’t going so well.
With the residual term specified, I received a warning that there were singularities and the model could not be solved.
Remove the residual term and we get the model to fit… but it says NOT CONVERGED.
Still, we are left with an output I can show you as an example:
summary(alny_long_mod)$varcomp component std.error
trait:rep!trait_biomass.1:biomass.1 5.58549117 5.034742
trait:rep!trait_spring.vigor.av:biomass.1 -0.31930658 2.104452
trait:rep!trait_spring.vigor.av:spring.vigor.av 0.01977474 2.106323
trait:entry!trait_biomass.1:biomass.1 20.39086187 6.536167
trait:entry!trait_spring.vigor.av:biomass.1 3.05569762 3.448362
trait:entry!trait_spring.vigor.av:spring.vigor.av 0.46627054 6.900820
units!R 46.55297587 3.135684
z.ratio bound %ch
trait:rep!trait_biomass.1:biomass.1 1.109389841 ? 0.0
trait:rep!trait_spring.vigor.av:biomass.1 -0.151729116 ? 0.3
trait:rep!trait_spring.vigor.av:spring.vigor.av 0.009388277 ? 0.6
trait:entry!trait_biomass.1:biomass.1 3.119697201 ? 0.0
trait:entry!trait_spring.vigor.av:biomass.1 0.886130122 ? 0.1
trait:entry!trait_spring.vigor.av:spring.vigor.av 0.067567412 ? 0.2
units!R 14.846196185 P 0.0
NOTICE: Multivariate models are computationally challenging. Memory requirements for REML increases (I think) with the cube of the number of parameters. In other words, adding more traits very rapidly increases compute demand. REML sometimes will not converge to a global optimum. Preliminary analyses to estimate “starting values” for REML algorithms to use are often needed.
Some guidance:
asreml or your favorite solver as “starting values” to increase the chance of convergence.Section below is preparatory for your downstream goals (GWAS and GP).
We often have very large, multi-year, multi-location, multi-trial-type (MET) datasets we use for downstream genome-wide association and genomic prediction purposes. The number of plots can be in the range of 10- or even 100,000 plots with many thousands of unique genotypes observed in unbalanced fashion across heterogenous experimental designs. All that to say, the computational burden and level of expertise required to execute such analyses directly on these data is very great.
Because they focus on computational efficiency, many software packages used for GWAS (e.g. FarmCPU Liu et al. (2016)) limit the fixed and random-effects structures that can be modeled.
In contrast, packages like asreml and sommer can be used for GWAS, but are optimized to do so.
While a unified, single-step analysis is always the statistically best approach, computational burdens still often necessitate a “two-stage” analysis approach:
Stage 1: Model the raw data and obtain ‘summary values’ (at the level of genetic units, e.g. for each genotype in your population), adjusted for the complexities of experimental design.
Stage 2: Conduct inference or prediction analysis (e.g. GWAS) using the adjusted genotypic values from Stage 1 as the response.
Options for Stage 1:
Despite the benefits of BLUP for ranking and selection, you should NEVER BLUP TWICE (highly recommend you read Holland and Piepho (2024) also Garrick, Taylor, and Fernando (2009)).
Why not? In downstream GWAS and GP applications, we usually fit a random-effect for genotypes (lines in your population). We’ll talk about why and how later. The problem is, if your dependent-variable in the GWAS analysis is already a shrinkage-based prediction, you’ll shrink the distribution twice and lose at least some of the signal you are looking for.
As depicted in the figure below, you’ll want to keep lines as fixed-effects, i.e. use BLUEs from Stage 1 for your GWAS/GP analyses.
Stage 2 models, without weighting, assume independent and identical errors among the adjusted means being used as the response. However, real and unbalanced data mean there is real information on error variance-covariance among the adjusted-genotype-means from Stage 1 being lost.
There are several ways of preserving that information, as “weights” applied to the residual in Stage 2 Damesa et al. (2017) Fernández-González and Isidro y Sánchez (2025).
I’ll show you an example of a weight scheme and how it works below.
But first a digression on de-regression.
It is possible to fit a mixed-model with genotypes as random and then later un-shrink them. There are some good reasons to do so.
Deregression is an approach that is often used in animal breeding (Garrick, Taylor, and Fernando 2009; Holland and Piepho 2024; Isik, Holland, and Maltecca 2017). BLUPs get un-shrunken before submission to the next stage of analysis.
The de-regressed BLUP (drgBLUP) is simply the BLUP divided by a quantity called the reliability or . It is very similar (possibly equivalent) to the BLUE.
The reliability is defined as:
So
Recall that reliability ranges from 0 to 1 and measures the certainty surrounding each BLUP value; or rather, the probability the BLUP would change if another experiment (data point) were added. So if , then we have essentially zero expected error for a BLUP. For example, check varieties often have so many observations in a dataset that they will have high reliability.
So if a de-regressed BLUP is defined as you can see that for clones with high reliability, the BLUP will stay essentially the same, but if the reliability is low, the BLUP will be un-shrink or de-regress strongly. This is going to happen because genotypes with few observations and thus low reliability will be more strongly shrunken to zero during the first step.
Below is a density plot showing the distribution of the original data compared to the BLUP and the drgBLUP. This is meant to show how shrinkage and unshrinkage effect the data. Since BLUPs and drgBLUPs are centered on the mean, I added back the grand mean for comparison to the original data.
Regardless of whether BLUE or drg-BLUP are used in the 2nd stage, if data are unbalanced and/or come from complex data, then the precision of BLUE/drgBLUP is likely to vary. It has been shown to improve accuracy in the 2nd stage to include a weight to the R-structure (residual var-covar) according to results of the first stage.
The best way to do that accounts for both variances and covariances among the responses in the second stage; but it is very computationally intensive (Möhring and Piepho 2009; Damesa et al. 2017). Instead, “diagonal” approaches are easier. The R structure in stage 2 includes . Where w is a vector of “weights”. There are different options for weights.
Garrick, Taylor, and Fernando (2009) recommended weight is:
is broad-sense heritability.
is the reliability.
FINAL NOTE ON WEIGHTED ANALYSIS: This is an area that deserves consideration and is actively under discussion / development in the literature. Check-out this very recent article Fernández-González and Isidro y Sánchez (2025).
We didn’t cover the downstream applications yet. For that reason, you’ll forgive me for saving an example for once we’re doing Stage 2.
Now it’s your turn!
Our goal remains to jointly analyze the entire ALT dataset and to produce estimates of genetic parameters (heritability) and rankings of the entries (BLUPs).
For this lesson’s activity, you have two added objectives:
Use appropriate model comparison procedures to determine the optimal fit for your data.
As before, DATA OPTIONS: